126 research outputs found

Shared Arrangements: practical inter-query sharing for streaming dataflows

Author: Lattuada Andrea
McSherry Frank
Roscoe Timothy
Schwarzkopf Malte
Publication date: 01/06/2020

Current systems for data-parallel, incremental processing and view maintenance over high-rate streams isolate the execution of independent queries. This creates unwanted redundancy and overhead in the presence of concurrent incrementally maintained queries: each query must independently maintain the same indexed state over the same input streams, and new queries must build this state from scratch before they can begin to emit their first results. This paper introduces shared arrangements: indexed views of maintained state that allow concurrent queries to reuse the same in-memory state without compromising data-parallel performance and scaling. We implement shared arrangements in a modern stream processor and show order-of-magnitude improvements in query response time and resource consumption for interactive queries against high-throughput streams, while also significantly improving performance in other domains including business analytics, graph processing, and program analysis.

arXiv.org e-Print Archive

Repository for Publications and Research Data

The Last Plant Standing - Ecosystem assessment and Response to Human Disturbance in the Gordon Natural Area, West Chester University, Chester County, Pennsylvania

Author: McGeehin Michael
McSherry Frank
Mohn Katie
Noble Kristen
Stabs Rob
Publication venue: Digital Commons @ West Chester University
Publication date: 01/12/2004

Digital Commons @ West Chester University

A simple and practical algorithm for differentially private data release

Author: Hardt Moritz
Ligett Katrina
McSherry Frank
Publication date: 01/01/2010

We present new theoretical results on differentially private data release useful with respect to any target class of counting queries, coupled with experimental results on a variety of real-world data sets. Specifically, we study a simple combination of the multiplicative weights approach of [Hardt and Rothblum, 2010] with the exponential mechanism of [McSherry and Talwar, 2007]. The multiplicative weights framework allows us to maintain and improve a distribution approximating a given data set with respect to a set of counting queries. We use the exponential mechanism to select those queries most incorrectly tracked by the current distribution. Combining the two, we quickly approach a distribution that agrees with the data set on the given set of queries up to small error. The resulting algorithm and its analysis are simple, but nevertheless improve upon previous work in terms of both error and running time. We also empirically demonstrate the practicality of our approach on several data sets commonly used in the statistical community for contingency table release.
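The combination described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the even split of the privacy budget between selection and measurement, the Laplace noise scale, and the update step size `gap / (2n)` are all assumptions made for the example. Data are represented as a histogram over a small finite domain, and each counting query is a 0/1 vector over that domain.

```python
import math
import random


def evaluate(query, hist):
    """Answer a counting query (0/1 vector) on a histogram."""
    return sum(q * h for q, h in zip(query, hist))


def exponential_mechanism(scores, eps, rng):
    """Sample an index with probability proportional to exp(eps * score / 2).

    Assumes each score has sensitivity 1. Subtracting the maximum score
    keeps the exponentials numerically stable without changing the result.
    """
    top = max(scores)
    weights = [math.exp(eps * (s - top) / 2.0) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


def laplace(scale, rng):
    """Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def mwem(true_hist, queries, rounds, eps, seed=0):
    """Hypothetical sketch: multiplicative weights + exponential mechanism."""
    rng = random.Random(seed)
    n = sum(true_hist)
    # Start from the uniform distribution, scaled to n records.
    synth = [n / len(true_hist)] * len(true_hist)
    # Assumed budget split: half of each round's share for selection,
    # half for the noisy measurement.
    eps_round = eps / (2.0 * rounds)
    for _ in range(rounds):
        # Select a query that the synthetic data currently tracks badly.
        errors = [abs(evaluate(q, synth) - evaluate(q, true_hist))
                  for q in queries]
        i = exponential_mechanism(errors, eps_round, rng)
        # Measure the selected query on the true data with Laplace noise.
        measurement = evaluate(queries[i], true_hist) + laplace(1.0 / eps_round, rng)
        # Multiplicative weights update toward the noisy measurement.
        gap = measurement - evaluate(queries[i], synth)
        synth = [a * math.exp(q * gap / (2.0 * n))
                 for a, q in zip(synth, queries[i])]
        # Renormalize so the synthetic histogram still sums to n.
        total = sum(synth)
        synth = [a * n / total for a in synth]
    return synth
```

Calling `mwem([40, 10, 30, 20], queries, rounds=5, eps=1.0)` with a list of 0/1 query vectors returns a synthetic histogram with the same total mass as the input; how closely it matches the true answers depends on the privacy budget and the number of rounds.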

CiteSeerX

Caltech Authors

Foundations of Differential Dataflow

Author: Abadi Martín
McSherry Frank
Plotkin Gordon
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Differential dataflow is a recent approach to incremental computation that relies on a partially ordered set of differences. In the present paper, we aim to develop its foundations. We define a small programming language whose types are abelian groups equipped with linear inverses, and provide both a standard and a differential denotational semantics. The two semantics coincide in that the differential semantics is the differential of the standard one. Möbius inversion, a well-known idea from combinatorics, permits a systematic treatment of various operators and constructs.
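To make the role of Möbius inversion over a partially ordered set of differences more concrete, here is a small sketch under simplifying assumptions: times are pairs `(i, j)` ordered componentwise (a product of two chains), and the value at each time is a plain integer standing in for an element of an abelian group. The function names `differences` and `accumulate` are hypothetical and not taken from the paper.

```python
def differences(values):
    """Moebius inversion on a product of two chains.

    values[i][j] is the accumulated value at time (i, j). The returned
    table holds the difference at each time, by inclusion-exclusion:
    d[i][j] = v[i][j] - v[i-1][j] - v[i][j-1] + v[i-1][j-1].
    """
    def at(i, j):
        # Values before the start of either chain are treated as zero.
        return values[i][j] if i >= 0 and j >= 0 else 0

    rows, cols = len(values), len(values[0])
    return [[at(i, j) - at(i - 1, j) - at(i, j - 1) + at(i - 1, j - 1)
             for j in range(cols)]
            for i in range(rows)]


def accumulate(diffs, i, j):
    """Recover the value at time (i, j) by summing all differences at
    times less than or equal to (i, j) in the product order."""
    return sum(diffs[a][b] for a in range(i + 1) for b in range(j + 1))
```

Accumulating the differences at any time reproduces the original value at that time, which is the sense in which the two representations carry the same information.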

CiteSeerX

Crossref

Edinburgh Research Explorer

Fast matrix computations for pair-wise and column-wise commute times and Katz scores

Author: Andersen Reid
Boldi Paolo
Chung Fan
Davis P. J.
Golub Gene H.
Lanczos Cornelius
Liben-Nowell David
McSherry Frank
Mihail Milena
Varga R. S.
Publication venue: Informa UK Limited
Publication date: 19/04/2011

We first explore methods for approximating the commute time and Katz score between a pair of nodes. These methods are based on the approach of matrices, moments, and quadrature developed in the numerical linear algebra community. They rely on the Lanczos process and provide upper and lower bounds on an estimate of the pair-wise scores. We also explore methods to approximate the commute times and Katz scores from a node to all other nodes in the graph. Here, our approach for the commute times is based on a variation of the conjugate gradient algorithm, and it provides an estimate of all the diagonals of the inverse of a matrix. Our technique for the Katz scores is based on exploiting an empirical localization property of the Katz matrix. We adapt algorithms used for personalized PageRank computing to these Katz scores and theoretically show that this approach is convergent. We evaluate these methods on 17 real-world graphs ranging in size from 1000 to 1,000,000 nodes. Our results show that our pair-wise commute time method and column-wise Katz algorithm both have attractive theoretical properties and empirical performance.
Comment: 35 pages, journal version of http://dx.doi.org/10.1007/978-3-642-18009-5_13 which has been submitted for publication. Please see http://www.cs.purdue.edu/homes/dgleich/publications/2011/codes/fast-katz/ for supplemental code.
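For readers unfamiliar with the quantity being approximated, the column-wise Katz scores from one node can be written as the sum over walk lengths l ≥ 1 of alpha^l times the number of length-l walks to each other node. The sketch below computes that sum by plain truncation of the series. It is not the paper's algorithm (which adapts personalized PageRank methods and exploits localization); the function name, the adjacency-list input, and the fixed iteration count are assumptions for illustration, and the graph is assumed undirected.

```python
def katz_column(adj, source, alpha, iters=50):
    """Truncated-series Katz scores from `source` to every node.

    adj[u] lists the neighbours of node u in an undirected graph.
    The series converges when alpha is below 1 / (largest eigenvalue
    of the adjacency matrix); that condition is not checked here.
    """
    n = len(adj)
    # term holds alpha^l * (A^l e_source) for the current walk length l.
    term = [0.0] * n
    term[source] = 1.0
    scores = [0.0] * n
    for _ in range(iters):
        nxt = [0.0] * n
        for u, val in enumerate(term):
            if val:
                # Extend every walk ending at u by one edge.
                for v in adj[u]:
                    nxt[v] += alpha * val
        term = nxt
        scores = [s + t for s, t in zip(scores, term)]
    return scores
```

On a small graph the result can be checked against the closed form obtained from (I - alpha A)^{-1} - I; on large graphs this dense iteration touches every reachable node, which is the cost that localized methods aim to avoid.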

arXiv.org e-Print Archive

Crossref